Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering
Open-domain question answering (QA) tasks usually require the retrieval of
relevant information from a large corpus to generate accurate answers. We
propose a novel approach called Generator-Retriever-Generator (GRG) that
combines document retrieval techniques with a large language model (LLM), by
first prompting the model to generate contextual documents based on a given
question. In parallel, a dual-encoder network retrieves documents that are
relevant to the question from an external corpus. The generated and retrieved
documents are then passed to a second LLM, which generates the final answer.
By combining document retrieval and LLM generation, our approach addresses the
challenges of open-domain QA, such as generating informative and contextually
relevant answers. GRG outperforms the state-of-the-art generate-then-read and
retrieve-then-read pipelines (GENREAD and RFiD), improving on their performance
by at least +5.2, +4.2, and +1.6 on the TriviaQA, NQ, and WebQ datasets, respectively.
We provide code, datasets, and checkpoints at
https://github.com/abdoelsayed2016/GRG
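The data flow described in this abstract (generate documents, retrieve documents in parallel, then read both) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names `grg_answer`, `toy_generate`, `toy_retrieve`, and `toy_reader` are hypothetical, the word-overlap ranking merely stands in for a dual-encoder retriever, and the two "LLM" steps are placeholder functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Docs = List[str]


def grg_answer(question: str,
               generate_docs: Callable[[str], Docs],
               retrieve_docs: Callable[[str], Docs],
               read_and_answer: Callable[[str, Docs], str]) -> str:
    """Run generation and retrieval in parallel, then read both document sets."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        generated = pool.submit(generate_docs, question)
        retrieved = pool.submit(retrieve_docs, question)
        documents = generated.result() + retrieved.result()
    return read_and_answer(question, documents)


# Toy stand-ins (assumptions for illustration only).
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]


def toy_generate(question: str) -> Docs:
    # Placeholder for the first LLM, which would write contextual documents.
    return [f"(generated context for: {question})"]


def toy_retrieve(question: str, k: int = 1) -> Docs:
    # Placeholder for a dual-encoder retriever: rank by shared words.
    query = set(question.lower().rstrip("?").split())

    def overlap(doc: str) -> int:
        return len(query & set(doc.lower().rstrip(".").split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:k]


def toy_reader(question: str, documents: Docs) -> str:
    # Placeholder for the second LLM, which would produce the final answer.
    return f"answer drawn from {len(documents)} documents"


if __name__ == "__main__":
    print(grg_answer("What is the capital of France?",
                     toy_generate, toy_retrieve, toy_reader))
```

Passing the three stages in as callables keeps the sketch independent of any particular model or retrieval library; real components would replace the toy functions.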
Citation Recommendation: Approaches and Datasets
Citation recommendation describes the task of recommending citations for a
given text. Due to the overload of published scientific works in recent years
on the one hand, and the need to cite the most appropriate publications when
writing scientific texts on the other hand, citation recommendation has emerged
as an important research topic. In recent years, several approaches and
evaluation data sets have been presented. However, to the best of our
knowledge, no literature survey has been conducted explicitly on citation
recommendation. In this article, we give a thorough introduction to automatic
citation recommendation research. We then present an overview of the approaches
and data sets for citation recommendation and identify differences and
commonalities using various dimensions. Last but not least, we shed light on
the evaluation methods and outline general challenges in the evaluation and
how to meet them. We restrict ourselves to citation recommendation for
scientific publications, as this document type has been studied the most in
this area. However, many of the observations and discussions included in this
survey are also applicable to other types of text, such as news articles and
encyclopedic articles. Comment: to be published in the International Journal on Digital Libraries
Exploring the State of the Art in Legal QA Systems
Answering questions related to the legal domain is a complex task, primarily
due to the intricate nature and diverse range of legal document systems.
Providing an accurate answer to a legal query typically necessitates
specialized knowledge in the relevant domain, which makes this task all the
more challenging, even for human experts. Question answering (QA) systems are
designed to generate answers to questions asked in human languages. They use
natural language processing to understand questions and search through
information to find relevant answers. QA has various practical applications,
including customer service, education, research, and cross-lingual
communication. However, these systems face challenges such as improving natural
language understanding and handling complex and ambiguous questions. At this
time, there is a lack of surveys that discuss legal question answering. To
address this problem, we provide a comprehensive survey that reviews 14
benchmark datasets for question answering in the legal field and the
state-of-the-art deep learning models for legal question answering. We cover
the different architectures and
techniques used in these studies and the performance and limitations of these
models. Moreover, we have established a public GitHub repository where we
regularly upload the most recent articles, open data, and source code. The
repository is available at:
https://github.com/abdoelsayed2016/Legal-Question-Answering-Review
ScholarSight: Visualizing Temporal Trends of Scientific Concepts
2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL): June 2 2019 to June 6 2019, Champaign, IL, USA. In this paper, we present a system for exploring the temporal trends of scientific concepts. Scientific concepts were captured by extracting noun phrases and entities from all computer science papers of arXiv.org. Our system allows users to review the time series of numerous concepts and to identify positively and negatively trending concepts. By applying clustering techniques and cluster analysis visualizations, it can also present concepts that share the same usage patterns over time. Our system can be beneficial both for ordinary researchers of any field and for researchers working in bibliometrics and scientometrics who want to investigate the evolution of scientific concepts.
Dataset for Temporal Analysis of English-French Cognates
Languages change over time and, thanks to the abundance of digital corpora, their evolutionary analysis using computational techniques has recently gained much research attention. In this paper, we focus on creating a dataset to support investigating the similarity in evolution between different languages. We look in particular into the similarities and differences between the use of corresponding words across time in English and French, two languages from different linguistic families yet with shared syntax and close contact. For this we select a set of cognates in both languages and study their frequency changes and correlations over time. We propose a new dataset for computational approaches to the synchronized diachronic investigation of language pairs, and subsequently show novel findings stemming from the cognate-focused diachronic comparison of the two chosen languages. To the best of our knowledge, the present study is the first in the literature to use computational approaches and large data to make a cross-language diachronic analysis. Peer reviewed
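As a rough illustration of what studying "frequency changes and correlations over time" could involve, the sketch below computes a Pearson correlation between two frequency time series, one per language, for a single cognate pair. The abstract does not say which correlation measure the authors use, so Pearson is an assumption here, and the numbers in the usage example are invented toy values, not data from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long time series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented relative frequencies per time period for one cognate pair
    # (toy values for illustration only).
    english = [1.0, 1.2, 1.5, 1.9]
    french = [0.8, 0.9, 1.3, 1.6]
    print(round(pearson(english, french), 3))
```

A value near 1 would indicate that the two words rise and fall together across the periods compared; a value near -1 would indicate opposite trends.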
A city-wide examination of fine-grained human emotions through social media analysis.
The proliferation of social media and open web data has provided researchers with a unique opportunity to better understand human behavior at different levels. In this paper, we show how data from Open Street Map and Twitter can be analyzed and used to portray detailed human emotions at a city-wide level in two cities, San Francisco and London. Neural network classifiers for fine-grained emotions were developed, tested, and used to detect emotions from tweets in the two cities. The detected emotions were then matched to key locations extracted from Open Street Map. Through an analysis of the resulting data set, we highlight the effect that different days, locations, and POI neighborhoods have on the expression of human emotions in the cities.
- …